INDEX
Explanations
words related to politics and international relations
Negative Logits
bere          -0.59
Jasper        -0.59
Cullen        -0.59
sson          -0.58
EntityItem    -0.58
Edited        -0.57
Prohibition   -0.56
incred        -0.56
Philips       -0.56
succeeding    -0.56
Positive Logits
shaped        1.32
series        1.06
dimensional   0.98
bomb          0.96
factor        0.92
FU            0.92
level         0.91
Unit          0.91
word          0.90
section       0.90
Activations Density 0.040%