Explanations
mentions of notable people's names along with either positive or negative attributes
assertions about notable figures or their attributes
Negative Logits
Token       Logit
urry        -0.72
ear         -0.61
âĵĺ         -0.59
Else        -0.59
IFF         -0.58
Cou         -0.57
âĶĢâĶĢ      -0.57
occurs      -0.56
AGES        -0.56
ARM         -0.56
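The tokens `âĵĺ` and `âĶĢâĶĢ` look like raw byte-level BPE vocabulary strings rather than display text. The source does not name the tokenizer, so the following is only a sketch under the assumption that it uses the GPT-2-style byte-to-unicode mapping; the helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build a GPT-2-style map from byte values to printable characters.

    Assumption: the vocabulary uses the GPT-2 byte-level scheme, where
    printable Latin-1 bytes map to themselves and all other bytes are
    shifted to code points starting at U+0100.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse map: vocabulary character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Turn a byte-level vocabulary string back into readable text.

    Raises KeyError if the token contains a character outside the
    byte-level alphabet, which would suggest a different tokenizer.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")
```

Under that assumption, `decode_token` maps `âĶĢâĶĢ` to two box-drawing characters (`──`) and `âĵĺ` to the circled letter `ⓘ`, while plain ASCII tokens such as `urry` come back unchanged.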
Positive Logits
Token            Logit
indeed           1.06
certainly        1.02
undoubtedly      0.99
undeniably       0.99
notoriously      0.97
fundamentally    0.93
unquestion       0.90
understandably   0.86
profoundly       0.86
hardly           0.85
Activations Density: 0.562%
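The page does not define the density figure. A common reading, assumed here, is the fraction of sampled tokens on which the feature has a non-zero activation. The function name `activation_density` and the sample counts below are hypothetical and chosen only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the feature fires; the source does not state the exact definition.
    """
    if not activations:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 562 active tokens out of 100,000.
sample = [1.0] * 562 + [0.0] * (100_000 - 562)
density = activation_density(sample)
print(f"{density:.3%}")
```

With these made-up counts the ratio is 562 / 100,000, which matches the scale of the figure reported above; the real sample size behind the dashboard value is not given in the source.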