INDEX
Explanations
proper nouns with initials or abbreviations
capital letters, particularly of names or titles
Negative Logits
ModLoader    -0.81
dylib        -0.75
Topics       -0.71
FACE         -0.68
culosis      -0.68
Relations    -0.67
20439        -0.67
Contents     -0.66
duino        -0.65
âĶĢâĶĢ       -0.64
Positive Logits
itte         0.78
ahn          0.74
iner         0.73
uty          0.70
ze           0.70
oust         0.69
oner         0.69
utsch        0.67
erman        0.67
ham          0.66
Activations Density 0.257%