Explanations
instances of the word "acting" or related forms
Negative Logits
Token        Logit
im           -0.17
aggi         -0.15
Rosenstein   -0.15
Bucc         -0.14
Im           -0.14
es           -0.14
esen         -0.14
adata        -0.14
Eag          -0.14
incon        -0.14
Positive Logits
Token        Logit
renom        0.17
rema         0.16
ahu          0.15
902          0.15
premi        0.15
ossa         0.15
FG           0.14
@nate        0.14
æĶ           0.14
nem          0.14
Activations Density: 0.005%