INDEX
Explanations
Mentions of specific names, locations, and titles related to various individuals and places
geographic locations and names
Negative Logits
OPLE        -0.62
staking     -0.58
envy        -0.56
puzz        -0.51
enegger     -0.45
WARE        -0.44
SHIP        -0.44
neurot      -0.43
ACTED       -0.42
etheless    -0.41
Positive Logits
oz          0.57
ani         0.53
uer         0.53
ali         0.53
ida         0.53
oka         0.53
am          0.52
alla        0.52
ona         0.52
aman        0.51
Activations Density 0.600%