Explanations
phrases indicating a list of multiple items or actions
references to general concepts or items within a text
Negative Logits
Gaza        -0.63
ardo        -0.63
NAS         -0.62
oku         -0.61
bern        -0.60
inav        -0.60
irl         -0.60
NES         -0.60
Coul        -0.60
CVE         -0.59
Positive Logits
happened     0.95
happens      0.94
happening    0.93
happ         0.86
transpired   0.86
happen       0.85
pertaining   0.79
occurring    0.78
imaginable   0.75
worldly      0.73
Activation Density: 0.033%