INDEX
Explanations
references to academic articles or authors
Negative Logits
Token       Logit
Mejía       -0.99
Monfieur    -0.87
itſelf      -0.81
myſelf      -0.76
Cárdenas    -0.75
ſon         -0.74
Guimarães   -0.73
fubject     -0.72
ſch         -0.72
ſta         -0.71
Positive Logits
Token       Logit
JIM         0.46
del         0.46
Gea         0.45
Jof         0.44
River       0.44
fvar        0.44
Herr        0.42
jim         0.42
JIM         0.41
Mate        0.40
Activations Density: 0.290%