Explanations
names of people or characters in a text
pronouns and their associated forms in various contexts
Negative Logits
Token           Logit
̶               -0.65
consolidation   -0.65
eatures         -0.62
drivers         -0.61
corros          -0.60
%]              -0.59
critical        -0.57
steamapps       -0.56
unaff           -0.56
norm            -0.56
Positive Logits
Token           Logit
neau            0.87
oya             0.80
Pradesh         0.76
nikov           0.75
chuk            0.73
orf             0.73
wu              0.73
Brothers        0.73
ofer            0.73
ippi            0.72
Activations Density: 0.240%