INDEX
Explanations
terms related to specific car models
mentions of specific academic or institutional names
Negative Logits
Token      Logit
EFF        -0.71
APPLIC     -0.68
accomp     -0.67
artif      -0.65
enthusi    -0.62
begging    -0.61
cousin     -0.61
epis       -0.61
misdem     -0.60
framing    -0.59

Positive Logits
Token      Logit
isoft      0.93
ampire     0.91
endish     0.89
idable     0.88
netic      0.85
yang       0.85
ixt        0.84
raid       0.82
letal      0.80
auld       0.80
Activations Density 0.187%