INDEX
Explanations
instances of names, specifically focusing on proper nouns and entities
Negative Logits
behav       -0.76
disadvant   -0.71
distingu    -0.70
misunder    -0.70
Ender       -0.69
escape      -0.65
Reply       -0.65
independ    -0.64
AB          -0.64
Ichigo      -0.64
Positive Logits
EStreamFrame     0.98
milo             0.92
ForgeModLoader   0.90
Ń·               0.86
ola              0.85
Plaza            0.81
810              0.81
ilogy            0.80
TAMADRA          0.80
cci              0.79
Activations Density: 0.143%