INDEX
Explanations
descriptions of the size, weight, and construction materials of objects
phrases that refer to objects or devices
Negative Logits
Token       Logit
911         -0.66
dding       -0.65
course      -0.64
Corpus      -0.64
traumatic   -0.63
castle      -0.62
Guant       -0.62
priv        -0.62
Union       -0.61
Priv        -0.60
Positive Logits
Token       Logit
unes        1.03
chy         1.01
seems       1.00
alian       1.00
'll         0.99
self        0.98
theless     0.96
doesnt      0.90
's          0.89
appears     0.88
Activations Density: 0.282%