INDEX
Explanations
questions or statements about general knowledge or information
phrases related to conveying information or experiences
Negative Logits
Token    Logit
swer     -0.73
estone   -0.70
idon     -0.67
raid     -0.67
robe     -0.66
odge     -0.66
xon      -0.64
scan     -0.64
ivery    -0.64
etts     -0.63
Positive Logits
Token          Logit
happens        1.48
constitutes    1.42
happened       1.25
transpired     1.17
separates      1.17
distinguishes  1.14
motiv          1.10
kinds          1.09
qualifies      1.07
makes          1.05
Activations Density: 0.095%